Clinical statistics for non-statisticians: Day one
Steve Simon
Start with a bad joke
Two statistics are sitting in a bar. One turns to the other and asks, “So, how do you like married life?”
The other statistic responds …
Put your reaction (“Ha ha”, “Groan”, etc.) in the chat box.
Introduction
Tell us one interesting number about yourself
Examples
8: I have traveled to eight countries outside the United States
(Canada, Italy, China, France, Russia, England, Holland, and Iceland)
29: I did not learn how to drive until I was 29 years old
1802: My highest chess rating was 1802, but I am not that good any more.
Your turn
A bit more about myself
PhD in Statistics in 1982 from the University of Iowa
Currently full professor
Part-time statistical consultant
Funded on 18 research grants
Over 100 peer-reviewed publications
Website with over 2,000 pages
Many invitations to talk at conferences
Outline of the three-day course
Day one: Numerical summaries and data visualization
Day two: Hypothesis testing and sampling
Day three: Statistical tests to compare treatment to a control and regression models
My goal: help you to become a better consumer of statistics
Day one topics
Numerical summaries
When should you present the mean versus the median?
When should you present the range versus the standard deviation?
How should you display percentages?
Why should you round liberally?
Day one topics (continued)
Data visualization
How should you display continuous data?
Why is the normal bell-shaped curve important?
How should you display categorical data?
How do you illustrate trends and patterns?
What are some common mistakes in the choice of colors?
Quiz questions
Counting and proportions
Counts are the most common statistic
Counts are error prone
Counts require a solid operational definition
Student exercise
Count the number of occurrences of the letter “e”.
A quality control program is easiest
to implement from the top down.
Make sure that you understand the
the commitment of time and money
that is involved. Every workplace is
different, but think about allocating
10% of your time and 10% of the
time of all your employees to
quality control.
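A short sketch of why a count needs an operational definition: the same text gives different answers depending on whether a capital "E" counts as an "e". The function name count_letter and the sample sentence (one sentence borrowed from the passage above) are choices made for this illustration only.

```python
def count_letter(text, letter, ignore_case=False):
    """Count occurrences of a single letter in text."""
    if ignore_case:
        # Operational definition 2: upper and lower case both count.
        return text.lower().count(letter.lower())
    # Operational definition 1: only an exact match counts.
    return text.count(letter)


sentence = "Every workplace is different"

# The two definitions disagree because "Every" starts with a capital E.
print(count_letter(sentence, "e"))                    # 4
print(count_letter(sentence, "e", ignore_case=True))  # 5
```

Neither answer is wrong. The point is that the definition has to be fixed before anyone starts counting.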
Counting sperm
Figure 1: Image of a haemocytometer
Tables of counts, using the Titanic data.
Figure 2: Counts of survival by gender
Percentages dividing by column totals
Figure 3: Column percentages
Percentages dividing by row totals
Row percentages
Percentages dividing by grand total
Cell percentages
My recommendations
Treatment or exposure as rows
Outcome as columns
Usually report row percentages
Female mortality rate: 33%
Male mortality rate: 83%
But sometimes column percentages
Survivors: 68% female, 32% male
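The three kinds of percentages can be sketched in a few lines of Python. The counts below are an assumption: they were chosen to agree with the percentages quoted on these slides (female 154 died and 308 survived, male 709 died and 142 survived), and the function names are made up for this illustration.

```python
# Assumed counts, chosen to match the percentages quoted on the slides.
counts = {
    "Female": {"No": 154, "Yes": 308},
    "Male": {"No": 709, "Yes": 142},
}


def row_percentages(table):
    """Divide each cell by the total of its row."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result


def column_percentages(table):
    """Divide each cell by the total of its column."""
    columns = list(next(iter(table.values())))
    totals = {col: sum(cells[col] for cells in table.values()) for col in columns}
    return {
        row: {col: 100 * n / totals[col] for col, n in cells.items()}
        for row, cells in table.items()
    }


def cell_percentages(table):
    """Divide each cell by the grand total."""
    grand = sum(n for cells in table.values() for n in cells.values())
    return {
        row: {col: 100 * n / grand for col, n in cells.items()}
        for row, cells in table.items()
    }


rows = row_percentages(counts)
cols = column_percentages(counts)
print(f"Female mortality rate: {rows['Female']['No']:.0f}%")  # 33%
print(f"Male mortality rate: {rows['Male']['No']:.0f}%")      # 83%
print(f"Survivors who were female: {cols['Female']['Yes']:.0f}%")  # 68%
print(f"Survivors who were male: {cols['Male']['Yes']:.0f}%")      # 32%
```

Row percentages answer "what happened to each group?", while column percentages answer "who makes up each outcome?".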
Some rationale for these choices
My way
                  Survived
                  No           Yes
Sex     Female    33% (154)    67% (308)
        Male      83% (709)    17% (142)
Not my way
                  Sex
                  Female       Male
Survived    No    33% (154)    83% (709)
            Yes   67% (308)    17% (142)
On your own
Calculate row and column percentages for the following tables. Interpret your results.
Figure 4: Titanic passenger class counts
Figure 5: Titanic child counts
The mean (average)
Figure 6: Cartoon image of Professor Mean
The median
Figure 7: Road with a median strip
Calculation of the mean and median
Mean
Add up all the values, divide by the sample size
Median
Sort the data
Select the middle value if n is odd
Go halfway between the two middle values if n is even
Prefer the median when outliers or skewness might distort your conclusions
Often, either is fine
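A minimal sketch of the two calculations, written out by hand so that each step is visible. The data values are hypothetical (imagine lengths of stay in days) and are not taken from any study cited in this course.

```python
def mean(values):
    """Add up all the values, divide by the sample size."""
    return sum(values) / len(values)


def median(values):
    """Sort the data, then take the middle value (n odd) or go halfway
    between the two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical data: lengths of stay in days.
stays = [2, 3, 3, 4, 5]
print(mean(stays), median(stays))  # 3.4 3

# Add one extreme value and compare.
with_outlier = stays + [43]
print(mean(with_outlier), median(with_outlier))  # 10.0 3.5
```

One extreme value roughly triples the mean but barely moves the median. Without the outlier, the two numbers are close and either is fine.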
Criticisms of the mean and median
Are you combining apples and onions?
Are you ignoring minorities?
Use of the mean for ordinal data
Gould 1985
Figure 8: Gould 1985
Bridge 2001, PMID: 11405531
Figure 9: Bridge and McKenzie 2001
Bridge 2001, PMID: 11405531 (continued)
The measurement of airway resistance by the interrupter technique (Rint) needs standardization. Should measurements be made during the expiratory or inspiratory phase of tidal breathing? In reported studies, the measurement of Rint has been calculated as the median or mean of a small number of values, is there an important difference?
Bridge 2001, PMID: 11405531 (continued)
In the present data the mean of a set of values contributing to a measurement was not significantly different from the median. However, the use of the median has been recommended since it is less affected by possible outlying values such as might be included by fully automated equipment.
Tosato 2021, PMID: 34352201
Figure 10: Tosato et al 2021
Tosato 2021, PMID: 34352201 (continued)
Symptom persistence weeks after laboratory-confirmed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) clearance is a relatively common long-term complication of Coronavirus disease 2019 (COVID-19). Little is known about this phenomenon in older adults. The present study aimed at determining the prevalence of persistent symptoms among older COVID-19 survivors and identifying symptom patterns.
Tosato 2021, PMID: 34352201 (continued)
The mean age was 73.1 ± 6.2 years (median 72, interquartile range 27), and 63 (38.4%) were women. The average time elapsed from hospital discharge was 76.8 ± 20.3 days (range 25-109 days).
Ielapi 2021, PMID: 34968328
Figure 11: Ielapi et al 2021
Ielapi 2021, PMID: 34968328 (continued)
Background. Insomnia is one of the major health problems related with a decrease in quality of life (QOL) and also in poor functioning in night-shift nurses, that also may negatively affect patients’ care. The aim of this study is to evaluate the prevalence of insomnia in night shift nurses.
Ielapi 2021, PMID: 34968328 (continued)
Excerpt from Table 1.
Data reported as mean ± standard deviation or median [Q1-Q3]
                                             Overall (n = 2′355)
Age, years                                   40.4 ± 10.3
Months of work                               168 [72–300]
Night shifts per month, number               6.3 ± 1.4
Time to reach workplace, minutes             45 [45–65]
Rest time, minutes                           180 [4–240]
Rest in the afternoon, minutes               30 [0–120]
Number of coffees, mean                      2.5 ± 1.5
Number of coffees during night shift, mean   1.4 ± 1.1
Chen 2019, PMID: 31806195
Figure 12: Chen et al 2019
Chen 2019, PMID: 31806195 (continued)
Background: The prices of newly approved cancer drugs have risen over the past decades. A key policy question is whether the clinical gains offered by these drugs in treating specific cancer indications justify the price increases.
Chen 2019, PMID: 31806195 (continued)
Results: We found that between 1995 and 2012, price increases outstripped median survival gains, a finding consistent with previous literature. Nevertheless, price per mean life-year gained increased at a considerably slower rate, suggesting that new drugs have been more effective in achieving longer-term survival. Between 2013 and 2017, price increases reflected equally large gains in median and mean survival, resulting in a flat profile for benefit-adjusted launch prices in recent years.
Percentiles
Figure 13: Illustration of the 75th percentile
Computing percentiles
Many formulas
Differences are not worth fighting over
My preference (pth quantile)
Sort the data
Calculate p(n+1)
Is it a whole number?
Yes: Select the value in that position
No: Go halfway between the two neighboring values
Special cases: p(n+1) < 1 or p(n+1) > n
Some examples of percentile calculations
Example for n=39
For 5th percentile, p(n+1)=2 -> 2nd smallest value
For 4th percentile, p(n+1)=1.6 -> halfway between two smallest values
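The p(n+1) rule can be sketched in Python as follows. The slides do not say what to do in the special cases, so returning the smallest value when p(n+1) < 1 and the largest when p(n+1) > n is an assumption made for this sketch. The function name and the small tolerance for floating-point rounding are also illustrative choices.

```python
def percentile(values, p):
    """pth quantile using the p(n+1) rule; p is a fraction such as 0.05."""
    ordered = sorted(values)
    n = len(ordered)
    position = p * (n + 1)

    # Special cases. Assumption: fall back to the smallest or largest value.
    if position < 1:
        return ordered[0]
    if position > n:
        return ordered[-1]

    # Whole number (allowing for floating-point rounding): select that value.
    nearest = round(position)
    if abs(position - nearest) < 1e-9:
        return ordered[nearest - 1]

    # Otherwise go halfway between the two neighboring values.
    lower = int(position)
    return (ordered[lower - 1] + ordered[lower]) / 2


# n = 39, using the values 1 through 39 so that the kth smallest value is k.
data = list(range(1, 40))
print(percentile(data, 0.05))  # p(n+1) = 2   -> 2nd smallest value: 2
print(percentile(data, 0.04))  # p(n+1) = 1.6 -> halfway between 1 and 2: 1.5
```

Statistical software packages use a variety of interpolation formulas, so their answers may differ slightly from this rule. As noted above, those differences are not worth fighting over.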